Abstract. The m-sophistication of a finite binary string x is introduced
as a generalization of a parameter used in the proof that high complexity
of complexity is rare. A probabilistic near sufficient statistic of x is
given whose length is upper bounded by the m-sophistication of x within
small additive terms. This shows that m-sophistication is lower bounded by
coarse sophistication and upper bounded by sophistication within small
additive terms. It is also shown that m-sophistication and coarse
sophistication cannot be approximated by an upper or lower semicomputable
function, not even within very large error.
Introduction

The Kolmogorov complexity of a finite binary sequence is a measure for the
amount of structure in a finite discrete sequence. Sophistication [1,17] is a
measure to quantify the complexity of this structure. It is shown here that
sophistication and its newly introduced variant, m-sophistication, are related
to three important questions in the field of statistics and computability.

- If the Kolmogorov complexity $K(x)$ is low for some finite binary sequence
  x, then x can be interpreted as "deterministically" generated, and as
  "non-deterministically" generated otherwise. The structure function
  [16,19,22] maps, for each x, a natural number k to the logarithm of the
  minimal cardinality of a set containing x. If the structure function
  decreases for low k to the value $K(x) - k$, such sequences are called
  "positively random". Positive randomness is satisfied with high probability
  if x is "stochastically" generated. Such x allow a useful definition of
  frequentist probabilities satisfying the Kolmogorov probability axioms.

- A sumtest for a computable semimeasure is an abstraction of a statistical
  significance test for a simple hypothesis. It can be argued that for many
  composite hypotheses, a theoretically ideal statistical test is given by a
  sumtest for a lower semicomputable semimeasure [4]. The question arises
  whether, for a given lower semicomputable semimeasure, unbounded sumtests
  exist in some computability class. It turns out that for the hypothesis of
  independence there are no unbounded computable or lower semicomputable
  sumtests, but there are upper semicomputable sumtests of maximal magnitude
  $l(x)$ [7]. There are also no unbounded computable or lower semicomputable
  sumtests for a universal semimeasure, but there are upper semicomputable
  sumtests of magnitude $\log l(x) - O(\log\log l(x))$ [2]. The proof relies
  on the observation that the introduced m-sophistication for a universal
  semimeasure m is, within logarithmic terms, a sumtest for m.

- The coding theorem justifies the approximation of the logarithm of a
  universal semimeasure by data-compression heuristics [10,11,21]. The
  hypothesis of a time series x being influence-free of another time series y
  corresponds to a universal online semimeasure [4,9]. The approximation of
  such a semimeasure is also related to online complexities [4,9]. The error
  in such a coding result is given by m-sophistication [3,5].
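As a rough illustration of the compression heuristic mentioned in the last item, the sketch below uses the length of a zlib-compressed string as a computable stand-in for $K(x)$. This is only an upper-bound proxy, under the assumption that a general-purpose compressor captures the relevant structure. The function name `compressed_length_bits` and the two test inputs are hypothetical and are not taken from [10,11,21].

```python
import random
import zlib


def compressed_length_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed form of data.

    This is a crude, computable upper-bound proxy for K(data);
    it is not the Kolmogorov complexity itself.
    """
    return 8 * len(zlib.compress(data, 9))


# Highly structured input: a short pattern repeated many times.
regular = b"01" * 4096

# Pseudo-random input from a fixed seed (assumption: it stands in
# for a "stochastically generated" sequence).
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(8192))

regular_bits = compressed_length_bits(regular)
random_bits = compressed_length_bits(random_like)
```

On these inputs the periodic string compresses to a small fraction of its length while the pseudo-random one does not, which mirrors the contrast between low and high $K(x)$.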

Overview and results. The paper uses definitions and observations from [5]
and basically runs through the proof of the theorem that high complexity of
complexity is rare as in [13], see also [12,14,19]. m-sophistication is a
generalization of a parameter used in this proof. It allows some simple
observations related to the questions above. Let k be the m-sophistication of
a finite sequence x. It is shown that the amount $K(x)$ of information in x
can be decomposed as k bits of Halting information and $K(x) - k$ bits of
additional information, within $2\log k$ error terms. The first k bits of the
Halting probability compute an approximate sufficient statistic for x. It is
shown that within $O(\log k)$ terms m-sophistication is larger than coarse
sophistication, and smaller than sophistication. Finally it is shown that
m-sophistication and coarse sophistication define within logarithmic terms a
sumtest relative to the universal semimeasure, and that they have no lower or
upper semicomputable approximation, not even within large error.

Definitions and notation. For an introduction to Kolmogorov complexity and
computability the reader is referred to [14,19], and for extensive specialized
background to [12,20]. Let $\omega$ be the set of natural numbers. The binary
strings $2^{<\omega}$ of finite length can be associated with $\omega$. Let
$l(x)$ denote the length of x in its binary expansion. Let $2^n$ and $2^{<n}$
be the sets of strings x with $l(x) = n$ and $l(x) < n$, respectively. Let
$\omega^{<\omega}$ be the set of finite sequences in $\omega$. The Real numbers
in $[0,1]$ are associated with Cantor space $2^\omega$.¹ For $r \in 2^\omega$,
$r^k$ denotes $r_1 r_2 \dots r_k$. For $x \in 2^{<\omega}$, $x^k$ denotes
$x_1 x_2 \dots x_k$.

¹ This association is not bijective, since the Real $0.a0111\dots$ equals the
Real $0.a1000\dots$ for any $a \in 2^{<\omega}$; however, this omission does
not cause problems.

A semimeasure P is a positive Real function that satisfies
$\sum\{P(x) : x \in \omega\} \le 1$. A semimeasure P (multiplicatively)
dominates a semimeasure Q, notation: $P \ge^* Q$, if a constant c exists such
that for all x: $cP(x) \ge Q(x)$. $P =^* Q$ iff $P \ge^* Q$ and $Q \ge^* P$.
A set S of semimeasures has a universal element m if $m \in S$ and m dominates
all semimeasures in S. Let f, g be functions depending on parameters x and n.
f dominates g (notation: $f \ge^+ g$) iff there is a constant c which satisfies
for all x and n: $f(x,n) + c \ge g(x,n)$. The constant c may depend on any
parameter except x and n. $f =^+ g$ iff $f \ge^+ g$ and $g \ge^+ f$.

Let $\Phi(\cdot|\cdot)$ represent a fixed optimal universal Turing machine that
is prefix-free in its first argument. $\Phi_t(p|x) \downarrow= y$ means that
$\Phi$ on input p, x outputs y and halts in less than t computation steps. A
Real function $f : \omega \to [0,1]$ is computable if there is a
$p \in 2^{<\omega}$ such that for all k, x: $\Phi(p|x,k) \downarrow= f(x)^k$.
An enumeration of a Real function $f(u)$ is a computable Real function
$g(u,t)$ such that for all t: $g(u,t) \le g(u,t+1)$, and such that
$\lim_{t\to\infty} g(u,t) = f(u)$. A lower semicomputable function f is a
function that has an enumeration. A function f is upper semicomputable if
$-f$ is lower semicomputable. With abuse of notation, an enumeration of f is
denoted as $f_t$.

Kolmogorov complexity and its properties. For $x, y \in \omega^{<\omega}$, let
the Kolmogorov complexity be

$$K_t(x|y) = \min\{l(p) : \Phi_t(p|y) \downarrow= x\}$$
$$K(x|y) = \lim_{t\to\infty} K_t(x|y).$$

For all $n \in \omega$: $K(n) \le^+ \log n + 2\log\log n$, and for all
$x \in 2^n$: $K(x) \le^+ n + 2\log n$. Let $x^*$ represent the
lexicographically first shortest program that produces x.

$$K(x,y) =^+ K(y) + K(x|y^*) =^+ K(y) + K(x|y, K(y)).$$

A Halting program can also output its own length, therefore

$$K(x) =^+ K(x, K(x)).$$

The coding theorem shows that

$$Q_\Phi(x) = \sum\{2^{-l(p)} : \Phi(p) \downarrow= x\} \qquad (1)$$
$$Q_K(x) = 2^{-K(x)} \qquad (2)$$

define universal semimeasures. This implies that for any universal semimeasure
m: $-\log m(x) =^+ K(x)$.
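The bound $K(n) \le^+ \log n + 2\log\log n$ stated above can be made concrete with an explicit self-delimiting code for integers. The sketch below uses the Elias delta code, a standard construction chosen here for illustration and not taken from the paper. Its code word for n has length $\lfloor\log n\rfloor + 2\lfloor\log(\lfloor\log n\rfloor + 1)\rfloor + 1$, so a prefix-free machine that decodes it witnesses the bound up to an additive constant. The names `elias_delta` and `length_bound` are assumptions of this example.

```python
import math


def elias_delta(n: int) -> str:
    """Prefix-free binary code word (as a '0'/'1' string) for an integer n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    binary = bin(n)[2:]            # binary expansion of n; the leading bit is 1
    length = len(binary)           # floor(log2 n) + 1
    length_bin = bin(length)[2:]   # binary expansion of that length
    # A unary run of zeros announces how many bits the length field has.
    prefix = "0" * (len(length_bin) - 1)
    # The leading 1 of n is implied by the length field, so it is dropped.
    return prefix + length_bin + binary[1:]


def length_bound(n: int) -> float:
    """log2(n) + 2*log2(log2(n) + 1) + 1, an upper bound on len(elias_delta(n))."""
    return math.log2(n) + 2 * math.log2(math.log2(n) + 1) + 1
```

Since no code word is a prefix of another, the code can be read off the input of a prefix-free machine without an end marker, which is the property the definition of $\Phi$ requires.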

1 Halting probability and a Busy Beaver variant

In computability theory, the number $\Omega$ is typically defined as the prior
probability that some universal prefix-free Turing machine halts [8,13]. Here a
closely related concept is studied: the probability that a universal
semimeasure is defined.

Definition 1. Let m be some universal semimeasure.

$$\Omega_{m,t} = \sum_{x \le t} m_t(x)$$
$$\Omega_m = \lim_{t\to\infty} \Omega_{m,t}$$

The original definition in [8,13] is obtained by choosing $m = Q_\Phi$, as in
equation (1). $\Omega_{Q_\Phi}$ satisfies the following well known theorem.

Theorem 1. For all n: $K(\Omega_{Q_\Phi}^n) \ge^+ n$. There is a constant c
such that for all n, the Halting of any program p with $l(p) \le n$ can be
decided by $\Omega_{Q_\Phi}^{n+c}$.

It will be shown later in this section that these properties of
$\Omega_{Q_\Phi}$ remain for general $\Omega_m$, with a similar argument. Let
a, b represent objects or tuples of objects in $2^{<\omega}$ ($\omega$) that
possibly depend on the parameters n or x. It is said that "a computes b"
(notation: $a \longrightarrow b$) iff there is a constant c such that for all
values of the parameters x and n: $K(b|a) \le c$. The notation
$a \longleftrightarrow b$ means that both $a \longrightarrow b$ and
$b \longrightarrow a$. For $\alpha, \beta \in 2^\omega$, the relation
$\alpha^n \longrightarrow \beta^n$ defines a partial order on $2^\omega$, which
is equivalent with the 'domination' relation in [18]. $\Omega_{Q_\Phi}$ is
stable with respect to the choice of universal machine $\Phi$. Let $\Phi$,
$\Phi'$ be two optimal universal prefix-free Turing machines and let $Q_\Phi$
and $Q_{\Phi'}$ be defined as in equation (1), then it is easily observed that

$$\Omega_{Q_\Phi}^n \longleftrightarrow \Omega_{Q_{\Phi'}}^n.$$

Another example of such a relation is

where $Q_K$ is defined in equation (2). It is an interesting question whether
the opposite direction also holds.

Following the proof that high $K(K(x)|x)$ is rare in [13], the times $t_n$ are
defined. Fix some universal semimeasure m, and let for each n:

$$t_n = \min\{t : \Omega_m - \Omega_{m,t} \le 2^{-n}\}.$$
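To make the definition of $t_n$ tangible, the sketch below evaluates it on a toy nondecreasing sequence of approximations whose limit is known exactly. This is an assumption made purely for illustration: for a universal semimeasure the limit $\Omega_m$ is not computable and $t_n$ grows faster than any computable function, so no such program exists for the real object. The names `stage_times` and `stages` are hypothetical.

```python
from fractions import Fraction


def stage_times(omega_stages, omega_limit, n_max):
    """Return [t_1, ..., t_n_max] where
    t_n = min{t : omega_limit - omega_stages[t] <= 2**-n}.

    omega_stages must be nondecreasing and long enough for every n <= n_max.
    """
    times = []
    for n in range(1, n_max + 1):
        bound = Fraction(1, 2 ** n)
        t_n = next(t for t, w in enumerate(omega_stages)
                   if omega_limit - w <= bound)
        times.append(t_n)
    return times


# Toy approximation: stage t adds weight 2**-(t+2), so the limit is 1/2
# and the gap after stage t is exactly 2**-(t+2).
stages = []
total = Fraction(0)
for t in range(40):
    total += Fraction(1, 2 ** (t + 2))
    stages.append(total)

toy_times = stage_times(stages, Fraction(1, 2), 5)
```

In this toy case the gap halves at every stage, so $t_n$ grows linearly; the point of Lemma 3 below is that for a universal semimeasure the analogous times grow like the Busy Beaver function.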

It is easily observed that

Lemma 1.

$$\Omega_m^n \longleftrightarrow n, t_n$$

Lemma 2. Let $t[p] = \min\{t : \Phi_t(p) \downarrow\}$. For any universal m,
there is a constant c such that for any Halting $p \in 2^{<\omega}$:

$$\Phi(p) < t_{l(p)+c}$$
$$t[p] < t_{l(p)+c}.$$

Proof. Let $n = l(p) + c + 1$ with c large enough and suppose that
$l(p) \ge c + 2$. Let $x \in 2^{<\omega}$ be the lexicographically first string
with $-\log m_{t_n}(x) \ge l(x) \ge 2l(p)$. Suppose that $\Phi(p) \ge t_n$,
then $p \longrightarrow p, n \longrightarrow x$ and thus

$$-\log m(x) =^+ K(x) \le^+ l(p),$$

which implies for c sufficiently large

$$\Omega_m - \Omega_{m,t_n} \ge m(x) - m_{t_n}(x) \ge 2^{-l(p)-c} - 2^{-2l(p)} > 2^{-l(p)-c-1},$$

contradicting the definition of $t_n$. The second claim follows by remarking
that for every Halting p: $p \longrightarrow t[p]$. □

The Busy Beaver function is defined by:

$$BB(n) = \max\{\Phi(p) : l(p) < n\}.$$

Lemma 3 shows that $t_n$ is a very fast growing function that oscillates
between $BB(n)$ and $BB(n + 2\log n)$.²

Lemma 3. There exists a constant c such that:

$$BB(n - c) \le t_n < BB(n + 2\log n + c).$$

Proof. The left inequality follows from Lemma 2. By Lemma 1,

$$K(t_n) \le^+ K(\Omega_m^n) \le^+ n + K(n).$$

The witness of $K(t_n)$ shows the right inequality. □

Corollary 1. For all universal semimeasures m, m' there is some constant c
such that

$$t_n < t'_{n + 2\log n + c}$$

with $t_n$ and $t'_n$ defined by m and m'.

Proof.

$$t_n < BB(n + 2\log n + c) < t'_{n + 2\log n + 2c}$$

□

A Real number $\alpha \in 2^\omega$ is random if for any n:
$K(\alpha^n) \ge^+ n$. It follows by Lemma 3 that

Corollary 2. $\Omega_m$ is random.

Proof. Since $n \le^+ K(t_n) \le^+ K(\Omega_m^n)$. □

By Corollary 1 it follows that

Lemma 4. For m, m' universal semimeasures

Proof.

$$\Omega_m^n \longleftrightarrow n, t_n \longrightarrow \Omega_{m'}^{n - 2\log n}$$

□

² Remark that analogous bounds as in Lemma 5 can be proved.



The question arises whether the set of $\Omega_m$ for universal semimeasures m
has a maximal element relative to the $\longrightarrow$ order. Remark that it
is shown in [18] that the set of all $\Omega_m$ with m universal corresponds to
all computably enumerable random Real numbers.

Finally it can be asked whether these logarithmic bounds are tight. Some
remarks are made in relation to this question. For a random
$\alpha \in 2^\omega$ only a small range of values $K(\alpha^n)$ is allowed:

$$n \le^+ K(\alpha^n) \le^+ n + 2\log n.$$

It is well known that $K(\alpha^n)$ oscillates within these bounds.

Lemma 5. For any random $\alpha \in 2^\omega$ there are infinitely many n such
that

$$K(\alpha^n) \le^+ n + 2\log\log n,$$

and there are infinitely many n such that

$$K(\alpha^n) \ge^+ n + \log n.$$

See appendix for the proof.

2 m-sophistication and complexity of complexity 

Definition 2. For some universal semimeasure m, and some $c \in \omega$, the
m-sophistication of $x \in 2^{<\omega}$ is given by:

$$k_c(x) = \min\{k : K_{t_k}(x) \le K(x) + c\}.$$

$k_c(x)$ is limit-computable in x, but neither lower nor upper semicomputable,
by Proposition 1. From Corollary 1 it is observed that $k_c$ is relatively
stable with respect to changes of the universal semimeasure m.

Corollary 3. Let m, m' be universal semimeasures and let k and k' be the
m-sophistication and m'-sophistication, then for any c:

$$k_c \le^+ k'_c + 2\log k'_c.$$

As for sophistication (see further), m-sophistication is also unstable with
respect to the parameter c.

Lemma 6. There is a c' such that for all c there are infinitely many x with

$$k_c(x) - k_{c+c'}(x) \ge n - 2\log n,$$

where $n = l(x)$.

See appendix for the proof.

By the coding theorem, a definition closely related to m-sophistication is
given by (m, m)-sophistication:

$$k'(x) = \min\{k : m_{t_k}(x) \ge m(x)/2\}.$$

Lemma 7. For any c large enough: $k' \ge k_c - c$.

Proof. By some time-bounded version of the coding theorem:

$$K_{t_{k'(x)+c}}(x) \le^+ -\log m_{t_{k'(x)}}(x) \le^+ -\log m(x) =^+ K(x).$$

□

High (m, m)-depth is rare.

Lemma 8. For any k and $S_k = \{x : k'(x) \ge k\}$:

$$m(S_k) < 2^{-k+1}.$$

Proof.

$$\tfrac{1}{2} m(S_k) \le m(S_k) - m_{t_k}(S_k) \le \Omega_m - \Omega_{m,t_k} \le 2^{-k}$$

□

Lemma 9. Let $k(x)$ be either $k'(x)$ or $k_c(x)$ for any c, then:

$$K(K(x)|x) \le^+ k(x) + 2\log k(x).$$

Proof. Remark that $t_{k(x)}, x \longrightarrow K(x)$, thus

$$K(K(x)|x) \le^+ K(t_{k(x)}) \le^+ K(\Omega_m^{k(x)}) \le^+ k(x) + 2\log k(x),$$

where the last inequality follows from Lemmas 1 and 5. □

Corollary 4. There exists a constant $c > 0$ such that

$$m(\{x : K(K(x)|x) \ge k\}) \le c\,2^{-(k - 2\log k)}.$$

A sumtest d for a semimeasure P is a function $d : 2^{<\omega} \to \mathbb{Z}$
such that

$$\sum_x P(x)\,2^{d(x)} \le 1.$$
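The defining inequality of a sumtest can be checked directly when the semimeasure has finite support. The sketch below does this for a toy semimeasure; the dictionaries `P`, `d_ok`, `d_bad` and the function `is_sumtest` are hypothetical names introduced for this example, and a universal semimeasure has of course no such finite table.

```python
def is_sumtest(P, d):
    """Check sum_x P(x) * 2**d(x) <= 1 over the finite support of P."""
    return sum(p * 2.0 ** d[x] for x, p in P.items()) <= 1.0


# Toy semimeasure on four strings; the weights sum to 15/16 <= 1.
P = {"0": 0.5, "1": 0.25, "00": 0.125, "01": 0.0625}

# A test that is large only where P is small:
# 0.5/2 + 0.25/2 + 0.125*1 + 0.0625*4 = 0.75.
d_ok = {"0": -1, "1": -1, "00": 0, "01": 2}

# Adding one bit of "significance" everywhere doubles the sum to 1.875.
d_bad = {x: 1 for x in P}

ok = is_sumtest(P, d_ok)
bad = is_sumtest(P, d_bad)
```

The example shows the trade-off that the definition enforces: a sumtest may take large values only on a set of small P-measure.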

Corollary 5. For $k = k'$ and for $k = k_c$ with c large enough,
$k - 2\log k$ defines a sumtest for m.

Proof.

$$\sum_x m(x)\,2^{k'(x) - 2\log k'(x) - 2} \le \sum_k m(S_k)\,2^{k - 2\log k - 2} \le \sum_k 2^{-2\log k - 1} \le 1$$

□
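The last step of this proof rests on the convergence of $\sum_k 2^{-2\log k - 1} = \sum_k 1/(2k^2)$. The sketch below checks the partial sums numerically, assuming the sum runs over $k \ge 1$ and that log is the binary logarithm; the function name `partial_sum` is an assumption of this example.

```python
import math


def partial_sum(K: int) -> float:
    """Sum of 2**(-2*log2(k) - 1) for k = 1..K, which equals sum of 1/(2*k*k)."""
    return sum(2.0 ** (-2 * math.log2(k) - 1) for k in range(1, K + 1))


first_term = partial_sum(1)
many_terms = partial_sum(10_000)
limit = math.pi ** 2 / 12  # value of the full series, about 0.8225
```

The partial sums stay below $\pi^2/12 < 1$, which is why the $-2\log k$ correction is enough to turn k into a sumtest.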

$k_c$ and $k'$ are not computable, and not even a logarithmic lower bound can
be computed.

Proposition 1. For $k = k'$ and for $k = k_c$ with c large enough, k cannot be
approximated by a lower or upper semicomputable function within
$k - 2\log k + O(1)$ error.

See appendix for the proof.

3 Sophistication and coarse sophistication 

Let f be a computable function. A function f-sufficient statistic [15] is a
computable prefix-free function g such that there exists a $d \in g^{-1}(x)$
with

$$K(g) + l(d) \le K(x) + f(l(x)).$$

The sophistication [17] of $x \in 2^{<\omega}$ is given by:

$$k_c^{\mathrm{soph}}(x) = \min\{K(f) : f \text{ is a } c\text{-sufficient statistic of } x\}.$$

Remark that there is a slight deviation from [17,23], since it is also
required that f is prefix-free. This is necessary to interpret sophistication
as the length of a minimal sufficient statistic [15]. Also remark that now
Lemma 10 is true. Let $bb(x)$ be the inverse of the Busy Beaver function, that
is, $bb(x) = \min\{k : x \le BB(k)\}$. It is a very slowly growing function,
dominated by any unbounded non-decreasing computable function [1].

Proposition 2. There exists a c' such that for all c, x:

$$k_{c+c'}(x) \le^+ k_c^{\mathrm{soph}}(x) + bb(x).$$

Proof. The inequality follows by observing that any function f witnessing the
definition of sophistication defines a description of x of length
$K(x) + c + c'$, for some c' large enough. Let $d = \min\{d : f(d) = x\}$, let

$$M = BB(bb(x)) \ge x \ge d,$$

and let p be the program that evaluates $f(e)$ for all $e \le M$. Let s be the
computation time of this computation. Remark that
$K_s(x) \le K(x) + c + c'$ and thus

$$s \ge t_{k_{c+c'}(x)} \ge BB(k_{c+c'}(x) - c)$$

for some c' large enough, by Lemma 3. This implies

$$k_{c+c'}(x) \le^+ K(s) \le^+ l(p) \le^+ K(f) + bb(x) \le^+ k_c^{\mathrm{soph}}(x) + bb(x).$$

□

A probabilistic f-sufficient statistic is a computable probability
distribution³ P such that

$$K(P) - \log P(x) \le K(x) + f(l(x)).$$

Since prefix-free functions are used here, probabilistic and function
sufficient statistics are equivalent.

Lemma 10. There is a constant c such that every probabilistic f-sufficient
statistic P defines a function $(f + c)$-sufficient statistic g with
$|K(P) - K(g)| \le c$, and every function f-sufficient statistic g defines a
probabilistic $(f + c)$-sufficient statistic P with $|K(P) - K(g)| \le c$.

Proof. The first claim is proved in [23]. It remains to show the second claim.
Let g be the function f-sufficient statistic and let

$$P(x) = \sum\{2^{-l(d)} : g(d) = x \wedge d \le x\}.$$

Remark that $P(x) = 0$ if there is no $d \le x$ with $g(d) = x$. It follows
that $-\log P(x) \le l(d)$ for the witness d of x in the definition of the
function f-sufficient statistic g. Remark that $K(g) \ge^+ K(P)$, and
therefore the conditions of the definition of $(f + c)$-sufficient statistic
are fulfilled. □

Let

$$P_k(x) = N\,2^{k}\,(m_{t_k}(x) - m_{t_{k-1}}(x)),$$

where N is a normalization constant such that $P_k$ defines a computable
probability distribution. Remark that N is bounded between two positive
constants. Also remark that this can be considered as the probabilistic
equivalent of the "explicit minimal near sufficient set statistic" described
in [15].

Lemma 11. For $m = Q_K$:

$$K(x|\Omega_m^{k'(x)}) \le^+ K(x) - k'(x).$$

Proof. Remark that since $m = Q_K$, for any k either
$m_{t_k}(x) = m_{t_{k-1}}(x)$ or $m_{t_k}(x) \ge 2m_{t_{k-1}}(x)$. This implies
that $P_{k'(x)}(x) =^* 2^{k'(x)} m(x)$. The Lemma follows by Shannon-Fano
coding. □
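The Shannon-Fano step used in this proof assigns to each x a code word of length $\lceil -\log P(x) \rceil$, which is possible for a prefix-free code because these lengths satisfy the Kraft inequality. The sketch below checks this on a toy distribution; the distribution `P_toy` and the function names are hypothetical and only illustrate the coding step, not the distribution $P_k$ of the paper.

```python
import math
from fractions import Fraction


def shannon_fano_lengths(P):
    """Code word lengths ceil(-log2 P(x)) for every x with P(x) > 0."""
    return {x: math.ceil(-math.log2(p)) for x, p in P.items() if p > 0}


def kraft_sum(lengths):
    """Exact value of sum_x 2**-length(x); a prefix-free code with these
    lengths exists iff this sum is at most 1."""
    return sum(Fraction(1, 2 ** n) for n in lengths.values())


P_toy = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
lengths = shannon_fano_lengths(P_toy)
kraft = kraft_sum(lengths)
```

Each code length exceeds $-\log P(x)$ by less than one bit, which is the source of the additive constants hidden in the $\le^+$ notation.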

To relate $P_k$ to sophistication, it is shown that it defines some
f-sufficient statistic.

Proposition 3. There exists a c such that $P_{k'(x)}$ is a probabilistic
$(2\log k'(x) + c)$-sufficient statistic for x. There exists a c such that for
any c', there is a $k \le^+ k_{c'}(x)$ such that $P_k$ is a
$(3\log k_{c'}(x) + c + c')$-sufficient statistic for x.

See appendix for the proof.

³ A probability distribution is a semimeasure with
$\sum_{x \in \omega} P(x) = 1$.

The online coding theorem [5] relates the logarithm of a universal online
semimeasure (causal semimeasure) to online Kolmogorov complexity. The online
coding theorem has an error term, which is improved for the length-conditional
case in [3,5]. In the proof of the improved online coding theorem, an online
computable semimeasure is associated with $P_{k'(x)}$. It is shown that the
values of the logarithm of the universal online semimeasure and of the
associated semimeasure for x are equal within an $O(\log k'(x))$ term. Since
the associated semimeasure is computable, a variant of the Shannon-Fano code
can be applied.

In [6] it is shown that the result of Proposition 3 cannot be further improved
to eliminate the logarithmic terms in order to consider $P_k$ as a
probabilistic c-sufficient statistic. It is shown that minimal sufficient
statistics contain a substantial amount of non-Halting information. The proof
seems to imply that, in contrast with m-sophistication, sophistication does
not define a sumtest. However, it is shown in [6] that $P_k$ defines a minimal
typical model [24].

Sophistication is unstable with respect to the parameter c, therefore in [1]
coarse sophistication is defined as

$$k^{\mathrm{csoph}}(x) = \min_c \{k_c^{\mathrm{soph}}(x) + c\}.$$

As a corollary of Proposition 3 it follows that:

Corollary 6.

$$k^{\mathrm{csoph}}(x) \le^+ k'(x) + 2\log k'(x).$$

Proposition 4. $k^{\mathrm{csoph}}(x) - 4\log k^{\mathrm{csoph}}(x)$ defines a
sumtest for m. $k^{\mathrm{csoph}}$ cannot be approximated by a lower or upper
semicomputable function within $k - 2\log k + O(1)$ error.

Proof. This follows from Corollary 5 and the same proof as for Proposition 1. □

Acknowledgments. The author is also grateful to M. Li and P. Vitanyi, for the
book [19] , without such a good introductory and reference book this work would 
never have appeared. 

References 

1. L. Antunes and L. Fortnow. Sophistication revisited. Theor. Comp. Sys.,
45(1):150-161, 2009.

2. B. Bauwens. Co-enumerable sumtests for the universal distribution.
Submitted, 2008.

3. B. Bauwens. Ideal hypothesis testing and algorithmic information transfer,
June 2009. Talk in Conference on Logic, Computability and Randomness,
www.lif.univ-mrs.fr/lce/bauwens.pdf

4. B. Bauwens. Influence tests I: ideal hypothesis tests and causal
semimeasures. ArXiv e-prints, December 2009.

5. B. Bauwens. Influence tests II: m-depth and on-line coding results. In
preparation, 2009.

6. B. Bauwens. On the equivalence between minimal sufficient statistics,
minimal typical models and initial segments of the Halting sequence. ArXiv
e-prints, November 2009.

7. B. Bauwens and S. Terwijn. Notes on sum-tests and independence tests.
Accepted for publication in Theor. Comput. Sys., open access, 2009.

8. Gregory J. Chaitin. A theory of program size formally identical to
information theory. J. Assoc. Comput. Mach., 22(3):329-340, 1975.

9. A. Chernov, S. Alexander, N. Vereshchagin, and V. Vovk. On-line
probability, complexity and randomness. In ALT '08: Proceedings of the 19th
international conference on Algorithmic Learning Theory, pages 138-153,
Berlin, Heidelberg, 2008. Springer-Verlag.

10. R. Cilibrasi and P.M.B. Vitanyi. Clustering by compression. IEEE Trans.
Inform. Theory, 51(4):1523-1545, 2005.

11. S. de Rooij and P.M.B. Vitanyi. Approximating rate-distortion graphs of
individual data: experiments in lossy compression and denoising. Submitted,
2006.

12. R. Downey and D. Hirschfeldt. Algorithmic randomness and complexity. To
appear.

13. P. Gacs. On the symmetry of algorithmic information. Soviet Math. Dokl.,
15:1477-1480, 1974.

14. P. Gacs. Lecture notes on descriptional complexity and randomness.
Technical report, Comput. Sci. Dept., Boston, 1988-2010.
http://www.cs.bu.edu/faculty/gacs/papers/ait-notes.pdf

15. P. Gacs, J. Tromp, and P.M.B. Vitanyi. Algorithmic statistics. IEEE Trans.
Inform. Theory, 47(6):2443-2463, 2001.

16. A.N. Kolmogorov. Complexity of algorithms and objective definition of
randomness. Uspekhi Mat. Nauk, 29(4), 1974. Abstract of a talk at Moscow Math.
Soc. meeting 4/16/1974, translation in M. Li and P.M.B. Vitanyi 2008,
page 438.

17. M. Koppel. The Universal Turing Machine: A Half-Century Survey, chapter
Structure, pages 435-452. R. Herken, Oxford University Press, 1988.

18. A. Kucera and T. Slaman. Randomness and recursive enumerability. SIAM J.
Comput., 31(1):199-211, 2001.

19. M. Li and P.M.B. Vitanyi. An Introduction to Kolmogorov Complexity and Its
Applications. Springer-Verlag, New York, 2008.

20. A. Nies. Computability and Randomness. Oxford University Press, Inc., New
York, 2009.

21. B. Ryabko, J. Astola, and A. Gammerman. Application of Kolmogorov
complexity and universal codes to identity testing and nonparametric testing
of serial independence for time series. Theoretical Computer Science,
359:440-448, August 2006.

22. N.K. Vereshchagin and P.M.B. Vitanyi. Kolmogorov's structure functions and
model selection. IEEE Trans. Inform. Theory, 50(12):3265-3290, 2004.

23. P.M.B. Vitanyi. Meaningful information. IEEE Trans. Inform. Theory,
52(10):4617-4626, 2006.

24. P.M.B. Vitanyi and M. Li. Minimum description length induction,
Bayesianism, and Kolmogorov complexity. IEEE Trans. Inform. Theory,
46(2):446-464, 2000.



12 Bruno Bauwens 



Appendix: Proofs of some lemmas and propositions 

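Proof of Lemma 5. For any k, let $n = 1\alpha^k$ (read as a binary number), so
that n can be computed by $\alpha^k$ and $\log n =^+ k$. Remark that for any
$z \in 2^{n-k}$ one has

$$K(z|\alpha^k) \le^+ K(z|n-k) \le^+ n - k,$$

and consequently,

$$K(\alpha^n) \le^+ K(\alpha^k) + K(\alpha_{k+1} \dots \alpha_n|\alpha^k) \le^+ k + 2\log k + n - k.$$

Since $\log n =^+ k$, this proves the first inequality. The second inequality
follows from Exercise 3.6.3d in [19]. □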

Proof of Lemma 6. Kolmogorov complexity fluctuates "continuously", in the
sense that there exists a constant e such that for all a, r:
$K(r+1, a) - e \le K(r, a) \le K(r+1, a) + e$. Let e be such a constant, large
enough. Since $K(t_{n - 2\log n - c - 2e}) \le n - c - e$, there always exists
an r such that:

$$n - c - 2e \le K(r, t_{n - 2\log n - c - 2e}) < n - c - e.$$

Remark that $r \le n^2$ can be chosen for n large enough. Let
$t = t_{n - 2\log n - c - 2e}$ and let $x \in 2^n$ be the lexicographically
r-th string such that $K_t(x) =^+ n$. Remark that such an x always exists by
Lemma 12, and

$$t, r, n \longleftrightarrow x.$$

This implies that for e large enough:

$$n - c - 3e \le K(x) < n - c.$$

Therefore

$$c < K_t(x) - K(x) \le c + 3e.$$

□

Lemma 12. For some computable function f large enough, and some constant c
large enough, there are infinitely many n such that the number of $x \in 2^n$
with

$$n - c \le K_{f(n)}(x) \le n + c$$

is larger than $2^{n - 2\log n}$.

Proof. There are infinitely many m such that $K_{f(m)}(m) =^+ K(m)$. For any
such m let $n = K(m) + m$. Remark that there are of the order of $2^m$ many
$r \in 2^m$ such that $K(r|m^*) =^+ m$, with $m^*$ a shortest program for m.
Let $r' \in 2^n$ be $r \in 2^{n - K(m)}$ prepended with $m^*$. This shows that
$r' \longleftrightarrow m, r$ and thus

$$n = K(m) + m =^+ K(m) + K(r|m^*) =^+ K(r').$$

Thus $K_{f^{(2)}(n)}(r') \ge^+ n$. Also

$$K_{f^{(2)}(n)}(r') \le^+ K_{f(m)}(m) + K_{f(m)}(r|m^*) =^+ n,$$

for f large enough. □

Proof of Proposition 1. Suppose that the function d approximates k such that
$k - d \le k - e\log k + O(1)$ for some constant e. This implies that
$d \ge e\log k - O(1)$. Remark that this implies by Corollary 5 that there
exists a c' such that $d - 4\log d - c'$ is a sumtest for m.

By [7] every lower semicomputable sumtest for m is bounded by a constant,
which implies that if d were lower semicomputable, then $d \le^+ 0$, and thus
only the constant $e = 0$ is allowed.

By [5] every upper semicomputable sumtest for m is bounded by
$\log l(x) + O(\log\log l(x))$. Therefore, only the constant $e = 1$ is
allowed. □

Proof of Proposition 3. Remark that for any k: $K(P_k) \le^+ k + 2\log k$.
Choosing $k = k'(x)$, and remarking that
$-\log P_{k'(x)}(x) \le^+ K(x) - k'(x)$, proves the first claim.

The second claim is now proved. By some time-bounded version of the coding
theorem there is a constant e such that:

$$-\log m(x) =^+ K(x) \ge K_{t_{k_c}}(x) - c \ge^+ -\log m_{t_{k_c + e}}(x) - c.$$

Therefore

$$m(x) \le^* m_{t_{k_c + e}}(x) \le^* \sum\{2^{-k} P_k(x) : k \le k_c + e\}.$$

This shows that there is a k such that

$$m(x) \le^* \frac{k}{2^k} P_k(x).$$

By applying the coding theorem, and taking $-\log$ of the above equation, one
obtains:

$$K(x) =^+ -\log m(x) \ge^+ k - \log k - \log P_k(x) \ge^+ K(P_k) - 3\log k - \log P_k(x),$$

which shows that $P_k$ is a $(3\log k + e')$-sufficient statistic. Remark that
$e' \le c + c'$ for some c' independent of c. □



